Spectral-like conjugate gradient methods with sufficient descent property for vector optimization

Several conjugate gradient (CG) parameters have led to promising methods for optimization problems. However, some of these parameters, for example 'PRP', 'HS', and 'DL', do not guarantee sufficient descent of the search direction. In this work, we introduce new spectral-like CG methods that achieve the sufficient descent property independently of any line search (LSE) and for arbitrary nonnegative CG parameters. We establish the global convergence of these methods for four different parameters using the Wolfe LSE. Our algorithm achieves this without regular restarts and without any convexity assumption on the objective functions. The sequences generated by our algorithm identify points that satisfy the first-order necessary condition for Pareto optimality. We conduct computational experiments to showcase the implementation and effectiveness of the proposed methods. The proposed spectral-like methods, namely the nonnegative SPRP, SHZ, SDL, and SHS, perform best in that order, and all of them outperform the HZ and SP methods in terms of the number of iterations, function evaluations, and gradient evaluations.


Introduction
In recent times, the successful application of CG methods to vector optimization problems (VOPs) has attracted considerable attention, as detailed in [1]. Since then, these approaches have gained recognition for their simplicity and minimal memory requirements, which make them effective in practice (see, for example, [2,3] and the references therein).
Another crucial iterative technique for optimization is the spectral gradient method introduced in [14], which has shown strong practical performance. Later, in [15], the spectral gradient and CG methods were combined to give the first spectral conjugate gradient (SCG) method, whose search direction involves a spectral parameter $\theta_{k-1}$ and a conjugate parameter $\beta_k$; see [15]. The SCG method has been extensively investigated by several authors, including spectral CG with the sufficient descent property [16,17], spectral CG involving RMIL [18], and a self-adjusting spectral method involving hybrid DL CG [19]. Most of the parameters mentioned above are considered for VOPs, in which the objective function $F : \mathbb{R}^n \to \mathbb{R}^m$ is assumed to be continuously differentiable. Let $Q \subset \mathbb{R}^m$ be a closed, convex, and pointed cone with a nonempty interior. The unconstrained VOP is defined as

$$\mathrm{VOP}: \quad \operatorname{Opt}_Q F(t), \tag{2}$$
which optimizes $F$, where $t \in \mathbb{R}^n$. Note that $F = (F_1, \dots, F_m)^T$ for some $F_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$. Moreover, if $Q = \mathbb{R}^m_+$, the set of vectors in $\mathbb{R}^m$ with nonnegative components, then VOP reduces to multi-objective optimization (MOO). Also, if $Q$ consists of only the nonnegative real numbers and $m = 1$, then (2) reduces to single-objective optimization (SOO). Several applications in industry and finance are instances of VOP, where multiple objective functions are optimized concurrently. Consequently, it becomes imperative to determine a set of optimal points for VOP [20-26]. Because a total order is lacking in $\mathbb{R}^m$ for $m \ge 2$, the solution to VOP consists of a set of non-dominated points, often referred to as Pareto-optimal or efficient points. The challenge lies in identifying the solutions that strike the most advantageous balance. It is important to mention that (2) signifies minimizing $F$ with respect to the ordering cone $Q$.
One way to approach VOPs is through scalarization techniques, which solve parameterized single-objective optimization problems to yield Pareto-optimal points. The decision-maker must choose the parameters, as they are not predetermined. Making this choice can pose significant challenges or even be impossible for some problems [27-29]. To overcome these drawbacks, descent-based algorithms have been suggested as solution approaches for VOPs, thanks to the works in [30,31]. Numerous other studies have since followed this direction; see the survey on MOO descent methods in [32] and the references [33-36].
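As a concrete illustration of scalarization, the sketch below applies a weighted-sum scalarization to a toy bi-objective problem. The objectives $F_1(x) = (x-1)^2$ and $F_2(x) = (x+1)^2$, the weight `w1`, and all function names are assumptions made purely for illustration; they are not taken from the cited works. Each weight yields a different Pareto-optimal point, which is why the decision-maker has to supply the parameter.

```python
def objectives(x):
    """Toy bi-objective function F(x) = (F1(x), F2(x)) (illustrative only)."""
    return ((x - 1.0) ** 2, (x + 1.0) ** 2)


def weighted_sum_minimizer(w1):
    """Minimizer of w1*F1(x) + (1 - w1)*F2(x) for a weight w1 in [0, 1].

    Setting the derivative 2*w1*(x - 1) + 2*w2*(x + 1) to zero,
    with w2 = 1 - w1, gives x = w1 - w2.
    """
    w2 = 1.0 - w1
    return w1 - w2


if __name__ == "__main__":
    # Each weight selects a different trade-off between F1 and F2.
    for w1 in (0.0, 0.25, 0.5, 0.75, 1.0):
        x = weighted_sum_minimizer(w1)
        print(w1, x, objectives(x))
```

Every point of the interval $[-1, 1]$ is Pareto-optimal for this toy problem, and the weight decides which one is returned; a descent-based method does not require such a choice in advance.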
In [1], the conjugate parameters of [4,5,10,12,37] are considered for VOPs. The study included numerical implementations of these methods, which were analyzed and discussed. Among these methods, for the considered test problems, the nonnegative PRP and HS showed exceptional performance in comparison to the others. Conversely, the CD and DY methods outperformed FR in terms of efficiency. Thereafter, Goncalves and Prudente [38] extended the Hager-Zhang CG method to VOPs. For this method, the search direction does not guarantee the descent condition, even with an exact LSE. To address this issue, the authors proposed a self-adjusting HZ method, utilizing a sufficiently accurate LSE, which possesses the descent property. Other works in this direction include the work of [39] based on a sufficiently accurate LSE, the first hybrid CG methods proposed for VOPs in [3], and some modified CG methods in [40].
Following the works in [15,41,42], He et al. [43] proposed the SCG methods for VOPs. In contrast to scalar optimization, the extension of SCG does not yield a descent property. As a result, the authors provided a modified self-adjusting SCG algorithm that induces the property through the algorithm. They established convergence using a sufficiently accurate LSE that satisfies the Wolfe conditions. It is therefore natural to ask whether there could be an SCG method for which the descent property is guaranteed without inducing it in the algorithm.
In this paper, we answer the above question affirmatively. We define a new form of search direction in the vector context, inspired by the work of [44]. Our method yields a sufficient descent property independent of any LSE for arbitrary nonnegative conjugate parameters. We consider four of these parameters and establish their convergence using the Wolfe LSE. We provide computational experiments to validate our findings. The outcomes are compared with the HZ and SP methods, showing that our proposed methods are promising.
The presentation of the work proceeds as follows: Section 2 discusses the basic notions and preliminaries. Section 3 presents the proposed algorithm and its convergence properties. Section 4 presents and discusses computational experiments, and Section 5 provides concluding remarks.

Preliminaries
This section presents the basic notions related to VOPs. For further details, see [1,31,45-47]. Throughout the subsequent sections, (2) signifies minimizing $F$ with respect to the ordering cone $Q$. For a generic $Q$, the partial order $\preceq_Q$ in $\mathbb{R}^m$ generated by $Q$ is defined by $z \preceq_Q y \iff y - z \in Q$, and the relation $\prec_Q$ is given by $z \prec_Q y \iff y - z \in \operatorname{int}(Q)$. Moreover, in VOP the idea of optimality is replaced by "Pareto-optimal or efficient" and "weak Pareto-optimal or efficient".
Definition 2 [32] A vector $\bar{t} \in \mathbb{R}^n$ is weak Pareto-optimal (weak efficient) if and only if there is no vector $y \in \mathbb{R}^n$ s.t. $F(y) \prec_Q F(\bar{t})$.
Remark 3 Definition 1 implies Definition 2, but the converse is not generally true.
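The relation between the two notions can be checked on a finite set of objective vectors. The sketch below is only an illustration under stated assumptions: it takes $Q = \mathbb{R}^m_+$ (the MOO case), it assumes the usual definition of Pareto optimality (no $y$ with $F(y) \preceq_Q F(\bar{t})$ and $F(y) \ne F(\bar{t})$), and it compares a handful of hypothetical objective vectors rather than all of $\mathbb{R}^n$. The function names are illustrative.

```python
def dominates(a, b):
    """True if a <= b componentwise and a != b (order induced by R^m_+)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def strictly_dominates(a, b):
    """True if a < b in every component, i.e. b - a lies in int(R^m_+)."""
    return all(x < y for x, y in zip(a, b))


def non_dominated(values):
    """Objective vectors not dominated by any other vector in the list."""
    return [p for p in values if not any(dominates(q, p) for q in values)]


def weakly_non_dominated(values):
    """Objective vectors not strictly dominated by any vector in the list."""
    return [p for p in values if not any(strictly_dominates(q, p) for q in values)]


if __name__ == "__main__":
    # Hypothetical objective values F(t) for five candidate points.
    values = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0), (1.0, 4.0)]
    print(non_dominated(values))
    print(weakly_non_dominated(values))
```

In this sample, the vector (1.0, 4.0) is weakly non-dominated but is dominated by (1.0, 3.0), which illustrates why the converse implication in Remark 3 fails in general.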
Below are some properties of the cone $Q$, including its positive polar cone $Q^*$. Notice that $Q = Q^{**}$, owing to the convexity and closedness of $Q$. The cone generated by $W \subset \mathbb{R}^m$ is denoted by cone$(W)$, and conv$(W)$ represents the convex hull of $W$. Now, consider a compact set $K \subset Q^*$ with $0 \notin K$ such that
$$Q^* = \operatorname{cone}(\operatorname{conv}(K)). \tag{3}$$
For a generic cone $Q$, we define $K$ as
$$K = \{ w \in Q^* \mid \|w\| = 1 \}, \tag{4}$$
which satisfies the condition in (3). In this work, we adopt the definition of $K$ provided in (4).
The Jacobian of $F$ at $t$ is denoted by $JF(t)$, and the image of $JF(t)$ in $\mathbb{R}^m$ is denoted by Image$(JF(t))$.
For the following, see, for example, [47]. Define $z : \mathbb{R}^m \to \mathbb{R}$ as $z(t) := \sup\{\langle t, q \rangle \mid q \in K\}$. The function $z$ is well defined, since $K$ is compact. We observe that $z$ also characterizes $-Q$ and $-\operatorname{int}(Q)$.
The iteration begins with an arbitrary $t_0 \in \mathbb{R}^n$ and is updated by
$$t_{k+1} = t_k + \ell_k d_k, \tag{10}$$
where $\ell_k > 0$ is called the step size, computed via an LSE method, and the search direction $d_k$ is defined in terms of the conjugate parameter $\beta_k$. The VOP versions of the parameters proposed in [1] are expressed through $f(\cdot, \cdot)$ defined by Eq (6). The vector version of the Hager-Zhang parameter was given in [38], with $\mu > \frac{1}{4}$. Now, consider problem (15); see, for instance, [46].
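For intuition about the steepest descent direction $h(t)$ and the value $v(t)$, the sketch below treats the multi-objective case with two objectives. It rests on assumptions that are not spelled out in the text above: $K$ is taken as the canonical basis, so that $f(t, d) = \max_i \nabla F_i(t)^T d$, and $h(t)$ is taken as the minimizer of $f(t, d) + \tfrac{1}{2}\|d\|^2$ with optimal value $v(t)$. Under these assumptions, $h(t)$ is the negative of the minimum-norm convex combination of the two gradients, which has a closed form. The function names are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def steepest_descent_two_objectives(g1, g2):
    """Return (h, v) for two gradients g1, g2 (assumed MOO setting).

    h = -(lam*g1 + (1 - lam)*g2), where lam in [0, 1] minimizes the norm
    of the convex combination, and v = -0.5 * ||h||^2.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        lam = 0.5  # identical gradients: any weight gives the same direction
    else:
        # Unconstrained minimizer of ||g2 + lam*(g1 - g2)||^2, clipped to [0, 1].
        lam = min(1.0, max(0.0, -dot(diff, g2) / denom))
    h = [-(lam * x + (1.0 - lam) * y) for x, y in zip(g1, g2)]
    v = -0.5 * dot(h, h)
    return h, v


def f_max(gradients, d):
    """f(t, d) = max_i <grad F_i(t), d> when K is the canonical basis."""
    return max(dot(g, d) for g in gradients)


if __name__ == "__main__":
    # Gradients of 0.5*||x - a||^2 and 0.5*||x - b||^2 at x = (0, 1),
    # with a = (1, 0) and b = (-1, 0).
    g1, g2 = [-1.0, 1.0], [1.0, 1.0]
    h, v = steepest_descent_two_objectives(g1, g2)
    print(h, v, f_max([g1, g2], h))
```

A value $v(t) = 0$ (equivalently $h(t) = 0$) signals a stationary point, while a negative value certifies that $h(t)$ decreases both objectives to first order. For more than two objectives, a quadratic programming solver would be needed in place of the closed form.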
Here, we define the LSEs most commonly used for conjugate gradient algorithms, namely the exact and inexact LSEs. As stated in [1], the inexact LSEs for VOPs are the standard Wolfe LSE and the strong Wolfe LSE: a step size $\ell > 0$ is accepted by the standard Wolfe LSE if it fulfills the standard Wolfe condition (17), and by the strong Wolfe LSE if it satisfies the strong Wolfe condition (SWC) (18).
Lemma 7 [31] Consider $h(t)$ and $v(t)$ as defined in (8) and (9), respectively. Then, among other properties, $h(t)$ is a $Q$-DD, and (c) the maps $h$ and $v$ are continuous.
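To make the Wolfe conditions discussed above tangible, the sketch below checks them numerically in the multi-objective case. The form used here is an assumption: a componentwise Armijo-type decrease $F_i(t + \ell d) \le F_i(t) + \rho\,\ell\,\nabla F_i(t)^T d$ for every $i$, together with the curvature condition $f(t + \ell d, d) \ge \sigma f(t, d)$ (standard) or $|f(t + \ell d, d)| \le \sigma |f(t, d)|$ (strong), where $f(t, d) = \max_i \nabla F_i(t)^T d$. The constants `rho` and `sigma`, the toy objectives, and all names are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def wolfe_flags(F, JF, t, d, step, rho=1e-4, sigma=0.1):
    """Return (standard_ok, strong_ok) for a trial step size (assumed form)."""
    slopes0 = [dot(g, d) for g in JF(t)]          # directional derivatives at t
    t_new = [ti + step * di for ti, di in zip(t, d)]
    # Sufficient decrease must hold for every objective.
    armijo = all(
        fn <= fo + rho * step * s
        for fn, fo, s in zip(F(t_new), F(t), slopes0)
    )
    f0 = max(slopes0)                              # f(t, d)
    f1 = max(dot(g, d) for g in JF(t_new))         # f(t + step*d, d)
    standard = armijo and f1 >= sigma * f0
    strong = armijo and abs(f1) <= sigma * abs(f0)
    return standard, strong


# Toy bi-objective problem: F_i(x) = 0.5 * ||x - c_i||^2 (illustrative only).
CENTERS = [(1.0, 0.0), (-1.0, 0.0)]


def F(x):
    return [0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in CENTERS]


def JF(x):
    return [[xi - ci for xi, ci in zip(x, c)] for c in CENTERS]


if __name__ == "__main__":
    t, d = [0.0, 1.0], [0.0, -1.0]
    for step in (0.01, 1.0, 1.5):
        print(step, wolfe_flags(F, JF, t, d, step))
```

In this toy run, the very short step fails the curvature condition, the unit step satisfies both conditions, and the long step satisfies only the standard condition, which is consistent with the strong condition being the more restrictive of the two.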

Spectral-like algorithm and convergence properties
In this section, we present the main algorithm and its convergence properties. Before delving into details, it is important to keep in mind the consequence of Lemma 7 (b) throughout this work. Consider the vector version of the Dai-Liao conjugate parameter, with $\alpha > 0$. We now consider the nonnegative parameters in (21) and the iterative scheme whose search direction $d_k$ is defined in (23). As mentioned in the preliminary section, $\ell_k > 0$ is computed via an LSE strategy. A sufficient descent condition follows from Lemma 6(a) and (23). This implies that we always have (7) with $c = 1$, irrespective of the LSE and the $\beta_k$ parameter.
Remark 8 It is easy to see that (23) can be expressed in an equivalent form. Thus, we now have a spectral CG method that achieves (7) without any LSE. Note that when employing an exact LSE here, (23) becomes the well-known nonlinear CG method (11) with $d_{k-1}$ replaced by $s_{k-1}$.
Before we proceed to the convergence analysis, we require the following assumptions. Assumption for all $k$. By (27) and (28), we have with $\|q\| = 1$. We stress that these assumptions naturally extend those considered in single-objective optimization. We will need the following Zoutendijk Lemma in our convergence analysis.
Lemma 12 [1] Let Assumptions 9 and 10 hold. Consider (10), where $d_k$ is a $Q$-DD and $\ell_k$ satisfies (17). Then the Zoutendijk condition (30) holds.
Next, we present the spectral-like algorithm for the nonnegative CG methods for VOPs.
Algorithm 1: A spectral-like algorithm for VOPs
Step 0: Take $t_0 \in \mathbb{R}^n$ and initialize $k \leftarrow 1$.
Step 3: Compute $d_k$ as defined in (23), where $\beta_k$ is a nonnegative parameter.
Step 4: Set $t_{k+1} = t_k + \ell_k d_k$, set $k \leftarrow k + 1$, and move to Step 1.

Remark 13 (i) Step 1 is well explained by Lemma 7.
(iii) In Step 3, we compute $d_k$ using any one of the conjugate parameters in (21) at a time and move to the next step, where the iterates are updated.
One of the sufficient conditions for establishing convergence here is an estimate of the norm of the search direction. If $k = 0$ or $f(t_k, s_{k-1}) < 0$, the estimate follows directly; otherwise, it is obtained from (23) by using (27) and (28), applying (33) and (34) in (32), and then using (35) in (31).
Next, we estimate the modulus of $\phi_k$ by means of the so-called property (*). This property was introduced by Gilbert and Nocedal [49] and recently extended in [1]. Property (*) indicates that $\beta_k$ is small whenever $s_{k-1}$ is small.
Property (*): Consider Algorithm 1 and assume a bound involving some $\bar{\delta} > 0$. Then we have property (*) if there exist $p > 1$ and $\lambda > 0$ satisfying the two defining inequalities. By (28) and (38), the resulting estimate holds with $c_1 = p/\gamma$. The result below indicates that, given some mild assumptions, a CG method fulfilling property (*) converges.
Theorem 14 [1] Consider a CG algorithm and assume that Assumptions 9 and 11 are satisfied for all $k$, s.t. (a)
Remark 15 It is evident that the standard Wolfe condition (17) holds whenever the strong Wolfe condition (18) is assumed. Thus, we assume only the strong Wolfe condition for the subsequent results.
(iv) b k ¼ maxfb DL k ; 0g: Then, we have By ( 17) and ( 24), we have for all k � 1.Thus, Using ( 44) and ( 43) in ( 51), we get We see by (42) that Therefore, Hence, � ≔ 8dgðLdþagÞ d 4 ð1À sÞ thereby completing the proof.The lemma presented herein aligns with Lemma 6 as documented in [38].It plays an important role in proving the convergence of the proposed CG methods.
Lemma 17 Suppose Assumptions 9 and 11 hold. Consider Algorithm 1, where $\ell_k$ is obtained by (18). If there exists $\bar{\delta} > 0$ s.t. and property (*) holds with $\beta_k \ge 0$, then $d_k \ne 0$ and where $w_k := d_k / \|d_k\|$.
Proof Notice that $d_k \ne 0$; otherwise (24) is not true. Thus, $w_k$ is well defined. From the fact that $d_k$ is a $Q$-DD at $t_k$ and $\ell_k$ fulfills (18), we have the Zoutendijk condition (30). Now, applying (52), Lemma 7 (b), and (7), we get (23) is rewritten as and by applying (54), we have Now, we have It follows from $\beta_k \ge 0$, (55), and the triangle inequality, that By (26), we have that $\|s_{k-1}\| \le 2\bar{M}$. Moreover, it follows from (28), (42), and (44) that Also, from (56), we have By (28) and (53), we get This completes the proof.
Next, we present the main convergence theorem, considering $\phi$ instead of $\beta_k$. The proof follows directly from [38, Theorem 2, p. 905], and thus it is omitted.
Theorem 18 Consider Algorithm 1 and assume that Assumptions 9 and 11 hold. Then

Numerical results and discussions
In this section, we evaluate the performance of the proposed spectral-like Algorithm 1 by examining the following methods: PRP+, HS+, HZ, and DL+. We aim to gauge their efficiency and robustness on benchmark test problems sourced from various MOO research articles. The algorithms were coded in Fortran 90. In the context of MOO, we define $e = [1, \dots, 1]^T \in \mathbb{R}^m$, $Q = \mathbb{R}^m_+$, and $K = \{e_1, e_2, \dots, e_m\} \subset \mathbb{R}^m$. Below, we summarize the methods under consideration, including their initial parameter values. This covers both our proposed methods and those employed for comparison purposes:
• SPRP+: a spectral-like PRP+ method given by Algorithm 1 with $\beta_k$ in (21);
• SHS+: a spectral-like HS+ method given by Algorithm 1 with $\beta_k$ in (21);
• SHZ+: a spectral-like HZ method given by Algorithm 1 with $\beta_k$ in (21) and $\mu = 1.0$;
• SDL+: a spectral-like DL+ method given by Algorithm 1 with $\beta_k$ in (21) and $\alpha = 0.1$.
Our findings are compared with the following CG methods:
• HZ: a Hager-Zhang CG algorithm given in [38] with $\mu = 1.0$;
• SP: a spectral CG method (SCG) given in [43].
An essential part of the algorithms is computing the steepest descent direction, $h(t)$. To achieve this, we utilize Algencan to solve problem (15); for more details, refer to [50]. In addition, the step size was selected using an LSE strategy that fulfills (18). The same LSE, employed for both HZ and SP, was used for all the proposed methods. Below are the initial parameters utilized in the LSE procedure for the implementation of the proposed methods: On the other hand, we have by Lemma 7 that $t \in \mathbb{R}^n$ is a stationary point if and only if $v(t) = 0$. Consequently, the experiments were conducted by running all the implemented methods up to the point of convergence, which is assumed to be $v(t) \ge -5 \times \mathrm{eps}^{1/2}$, or until the maximum number of iterations, #maxIt = 5000, is exceeded. Here, $v(t)$ is defined by (9) and the machine precision is $\mathrm{eps} \approx 2.22 \times 10^{-16}$.
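The stopping rule described above can be sketched in a few lines. This is only an illustration of the stated tolerance and iteration cap; the names `TOL`, `MAX_IT`, and `should_stop` are hypothetical, and the value of $v(t)$ is assumed to be supplied by whatever routine solves problem (15).

```python
import sys

EPS = sys.float_info.epsilon   # double-precision machine epsilon, about 2.22e-16
TOL = 5.0 * EPS ** 0.5         # convergence tolerance 5 * eps^(1/2), about 7.45e-8
MAX_IT = 5000                  # iteration cap (#maxIt in the text)


def should_stop(v_value, iteration):
    """Stop when v(t) >= -5*eps^(1/2) or when the iteration cap is exceeded."""
    return v_value >= -TOL or iteration > MAX_IT


if __name__ == "__main__":
    print(TOL)
    print(should_stop(-1e-9, 10))    # v(t) is within the tolerance of zero
    print(should_stop(-1e-3, 10))    # still far from stationarity
```

Since $v(t)$ is never positive, the test $v(t) \ge -\mathrm{TOL}$ amounts to asking that $v(t)$ be numerically zero.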
Details of the test problems under consideration are provided in Table 1. The first column gives the name of each problem; for instance, "Lov1" refers to the first problem introduced by A. Lovison in [51], and "SLCDT1" to the first problem given by Schütze, Lara, Coello, Dellnitz, and Talbi in [52]. All the remaining problems follow the same pattern with their corresponding references. The second column gives the corresponding references, the third column gives "n", the number of variables, and the fourth column gives "m", the number of objective functions of the problems. A box constraint was used for the starting points, defined as $\{t \in \mathbb{R}^n \mid \bar{l} \le t \le \bar{u}\}$, where the lower bound is given in the fifth column and the upper bound in the last column.
In Table 2, the results of the proposed algorithms on the respective test problems are presented in comparison with the HZ and SP CG methods. All the methods ran 100% successfully or reached a critical point. The table columns are 'Iter', 'FunEva', and 'GradEva', which denote the median number of iterations, function evaluations, and gradient evaluations, respectively.
In a VOP setting, the primary objective is to approximate the Pareto frontier of the given problem.To achieve this, we employed a methodology where each implemented method underwent 200 runs for each problem, and Iter, FunEva, and GradEva were recorded for each run.The methods began with initialization using uniformly distributed random points within the problem's specified bounds, as detailed in Table 1.The comparison metrics employed here include the Iter, FunEva, and GradEva.
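The experimental protocol described above (random starting points inside the box, followed by median statistics over the runs) can be sketched as follows. The helper names, the fixed seed, and the sample records are assumptions for illustration; the actual experiments were coded in Fortran 90, and no solver is invoked here.

```python
import random
import statistics


def sample_starting_points(lower, upper, runs=200, seed=0):
    """Draw `runs` starting points uniformly from the box [lower, upper]."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(runs)
    ]


def median_metrics(records):
    """Median of (Iter, FunEva, GradEva) over a list of per-run records."""
    iters, fun_evals, grad_evals = zip(*records)
    return (
        statistics.median(iters),
        statistics.median(fun_evals),
        statistics.median(grad_evals),
    )


if __name__ == "__main__":
    # Hypothetical two-variable box; real bounds would come from Table 1.
    points = sample_starting_points([-1.0, 0.0], [1.0, 2.0])
    print(len(points), points[0])
    # Hypothetical per-run records (Iter, FunEva, GradEva).
    print(median_metrics([(3, 10, 7), (5, 12, 9), (4, 30, 8)]))
```

Reporting the median rather than the mean limits the influence of a few unusually long runs on the summary figures.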
In order to guarantee an equitable and significant comparison of algorithms, we employed the well-established performance profile as documented in [53].The profile visually represents algorithm performance, which compares algorithmic performance across various metrics, offering a comprehensive assessment of efficiency and robustness.This tool enables us to concisely summarize the experimental data showcased in Tables 2 and 3. Furthermore, the

Concluding remarks
We introduced new spectral-like CG methods that achieve the sufficient descent property independently of any LSE and for arbitrary nonnegative CG parameters. Four well-known conjugate parameters, PRP+, HS+, HZ+, and DL+, are considered, and the resulting methods are referred to as SPRP+, SHS+, SHZ+, and SDL+, respectively. We established the convergence of the proposed methods using the Wolfe LSE. Our algorithms achieved this without regular restarts and without any convexity assumption on the objective functions. The sequences generated by our algorithm identify points that satisfy the first-order necessary condition for Pareto optimality. We conducted computational experiments that demonstrate the implementation and efficiency of the methods, with promising performance. The proposed spectral-like methods, SPRP+, SHZ+, SDL+, and SHS+, performed best in that order, outperforming the HZ and SP methods in all the considered metrics: the number of iterations, function evaluations, and gradient evaluations.
A challenging task to consider in the future is the three-term CG method, which is of special interest for yielding sufficient descent of the search direction. This may be challenging, considering that $f$ as defined in (6) is only sublinear with respect to the second variable. However, the work herein provides insight into the three-term method.

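Assumption 9 Let $Q$ be composed of a finite number of elements, and suppose there exists an open set $\mathcal{O}$ s.t. $\mathcal{L} := \{t \mid F(t) \preceq_Q F(t_0)\} \subset \mathcal{O}$, where $t_0 \in \mathbb{R}^n$, and there exists $L > 0$ s.t. $\|JF(t) - JF(t')\| \le L\|t - t'\|$ for all $t, t' \in \mathcal{O}$.
Assumption 10 Let $\{D_k\}_{k \in \mathbb{N}} \subset F(\mathcal{L})$ and $D_{k+1} \preceq_Q D_k$ for all $k$; then there exists $D \in \mathbb{R}^m$ s.t. $D \preceq_Q D_k$.
Assumption 11 The set $\mathcal{L} := \{t \mid F(t) \preceq_Q F(t_0)\}$ is bounded.
Note that, by Assumption 11, we have $\{t_k\} \subset \mathcal{L}$; this implies that there exists $\bar{M} > 0$ s.t. $\|t_k\| \le \bar{M}$ (26) and $\|JF(t_k)\| \le \gamma$ (27) for all $k$. Also, by the boundedness of $\{f(t_k, h(t_k))\}$ and Lemma 7(b), there exists $\delta > 0$ s.t. $\|h(t_k)\| \le \delta$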